A storyboard is a roadmap for video creation, consisting of shot-by-shot images that visualize the key plots of a text synopsis. Creating video storyboards, however, remains challenging: it not only requires associating high-level texts with images, but also demands long-term reasoning to make transitions smooth across shots. In this paper, we propose a new task called Text synopsis to Video Storyboard (TeViS), which aims to retrieve an ordered sequence of images to visualize the text synopsis. We construct a MovieNet-TeViS benchmark based on the public MovieNet dataset. It contains 10K text synopses, each paired with keyframes that are manually selected from the corresponding movies by considering both relevance and cinematic coherence. We also present an encoder-decoder baseline for the task. The model uses a pretrained vision-and-language model to improve high-level text-image matching. To improve coherence across long sequences of shots, we further propose to pre-train the decoder on large-scale movie frames without text. Experimental results demonstrate that our proposed model significantly outperforms other models in creating text-relevant and coherent storyboards. Nevertheless, there is still a large gap compared to human performance, suggesting room for promising future work.
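To make the retrieval setting concrete, below is a minimal sketch of the text-image matching component only: a pretrained vision-and-language model (CLIP-style) is assumed to produce one embedding for the synopsis and one per candidate movie frame, and frames are ranked by cosine similarity. The embedding inputs, dimensions, and greedy top-k ranking are illustrative assumptions; the paper's actual encoder-decoder model and coherence-oriented decoder pre-training are not reproduced here.

```python
# Illustrative sketch only: rank candidate keyframes by text-image similarity.
# Random vectors stand in for embeddings from a pretrained vision-and-language
# model; the ordering/coherence modeling of the actual TeViS baseline is omitted.
import numpy as np

def retrieve_storyboard(synopsis_emb: np.ndarray,
                        frame_embs: np.ndarray,
                        num_shots: int) -> list[int]:
    """Return indices of the top `num_shots` frames by cosine similarity."""
    synopsis_emb = synopsis_emb / np.linalg.norm(synopsis_emb)
    frame_embs = frame_embs / np.linalg.norm(frame_embs, axis=1, keepdims=True)
    scores = frame_embs @ synopsis_emb            # cosine similarity per frame
    return np.argsort(-scores)[:num_shots].tolist()

# Toy usage with random embeddings standing in for real model outputs.
rng = np.random.default_rng(0)
storyboard = retrieve_storyboard(rng.normal(size=512),
                                 rng.normal(size=(100, 512)), num_shots=5)
print(storyboard)
```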
Chain-of-Thought (CoT) prompting can dramatically improve the multi-step reasoning abilities of large language models (LLMs). CoT explicitly encourages the LLM to generate intermediate rationales for solving a problem by providing a series of reasoning steps in the demonstrations. Despite its success, there is still little understanding of what makes CoT prompting effective and which aspects of the demonstrated reasoning steps contribute to its performance. In this paper, we show that CoT reasoning is possible even with invalid demonstrations: prompting with invalid reasoning steps can achieve over 80-90% of the performance obtained with standard CoT under various metrics, while still generating coherent lines of reasoning during inference. Further experiments show that other aspects of the rationales, such as being relevant to the query and correctly ordering the reasoning steps, are much more important for effective CoT reasoning. Overall, these findings both deepen our understanding of CoT prompting and open up new questions regarding LLMs' capability to learn to reason in context.
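To illustrate the kind of ablation described above, the sketch below contrasts a standard CoT demonstration with one whose intermediate steps are deliberately invalid (wrong arithmetic) while remaining relevant to the query and correctly ordered. The questions, wording, and demonstration format are illustrative; the paper's actual prompts and benchmarks may differ.

```python
# Hedged sketch of valid vs. invalid CoT demonstrations (not the paper's prompts).
VALID_COT_DEMO = (
    "Q: Roger has 5 tennis balls. He buys 2 cans of 3 balls each. "
    "How many balls does he have now?\n"
    "A: Roger started with 5 balls. 2 cans of 3 balls is 6 balls. "
    "5 + 6 = 11. The answer is 11.\n"
)

INVALID_COT_DEMO = (
    "Q: Roger has 5 tennis balls. He buys 2 cans of 3 balls each. "
    "How many balls does he have now?\n"
    # Steps below stay relevant and in order, but the arithmetic is invalid.
    "A: Roger started with 5 balls. 2 cans of 3 balls is 5 balls. "
    "5 + 5 = 11. The answer is 11.\n"
)

def build_prompt(demo: str, question: str) -> str:
    """Prepend one demonstration to a new question, CoT style."""
    return f"{demo}\nQ: {question}\nA:"

print(build_prompt(INVALID_COT_DEMO, "A farm has 3 pens with 4 pigs each. "
                                     "How many pigs are there?"))
```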
The number of international benchmarking competitions is steadily increasing in various fields of machine learning (ML) research and practice. So far, however, little is known about common practice, or about the bottlenecks the community faces in tackling the research questions posed. To shed light on the status quo of algorithm development in the specific field of biomedical image analysis, we designed an international survey that was issued to all participants of challenges conducted in conjunction with the IEEE ISBI 2021 and MICCAI 2021 conferences (80 competitions in total). The survey covered participants' expertise and working environments, their chosen strategies, and algorithm characteristics. A median of 72% of challenge participants took part in the survey. According to our results, knowledge exchange was the primary incentive (70%) for participation, while the reception of prize money played only a minor role (16%). While a median of 80 working hours was spent on method development, a large portion of participants (32%) stated that they did not have enough time for method development, and 25% perceived the infrastructure to be a bottleneck. Overall, 94% of all solutions were deep learning-based; of these, 84% were based on standard architectures. 43% of the respondents reported that the data samples (e.g., images) were too large to be processed at once. This was most commonly addressed by patch-based training (69%), downsampling (37%), and solving 3D analysis tasks as a series of 2D tasks. K-fold cross-validation on the training set was performed by only 37% of the participants, and only 50% performed ensembling, based on either multiple identical models (61%) or heterogeneous models (39%). 48% of the respondents applied postprocessing steps.
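As a concrete illustration of the most common workaround reported above, the following sketch shows patch-based training in its simplest form: random fixed-size crops are drawn from an image that is too large to process at once. Patch size, image size, and data layout are illustrative assumptions rather than any survey respondent's actual pipeline.

```python
# Minimal sketch of patch-based training: crop random fixed-size patches
# instead of feeding the whole (possibly very large) image to the network.
import numpy as np

def random_patch(image: np.ndarray, patch_size: int = 256) -> np.ndarray:
    """Crop one random square patch from an H x W x C image."""
    rng = np.random.default_rng()
    h, w = image.shape[:2]
    top = int(rng.integers(0, h - patch_size + 1))
    left = int(rng.integers(0, w - patch_size + 1))
    return image[top:top + patch_size, left:left + patch_size]

large_image = np.zeros((2048, 2048, 3), dtype=np.float32)   # stand-in for a large sample
print(random_patch(large_image).shape)                       # (256, 256, 3)
```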
Deep neural networks (DNNs) have recently emerged as a promising tool for analyzing and solving complex differential equations arising in science and engineering applications. As an alternative to traditional numerical schemes, learning-based solvers utilize the representation power of DNNs to approximate input-output relations in an automated manner. However, the lack of physics-in-the-loop often makes it difficult to construct a neural network solver that simultaneously achieves high accuracy, low computational burden, and interpretability. In this work, focusing on a class of evolutionary PDEs characterized by decomposable operators, we show that the classical ``operator splitting'' numerical scheme for solving these equations can be exploited to design neural network architectures. This gives rise to a learning-based PDE solver, which we name Deep Operator-Splitting Network (DOSnet). Such a non-black-box network design is constructed from the physical rules and operators governing the underlying dynamics, contains learnable parameters, and is thus more flexible than the standard operator-splitting scheme. Once trained, it enables fast solution of the same type of PDEs. To validate the special structure inside DOSnet, we take linear PDEs as the benchmark and give a mathematical explanation for the weight behavior. Furthermore, to demonstrate the advantages of our new AI-enhanced PDE solver, we train and validate it on several types of operator-decomposable differential equations. We also apply DOSnet to nonlinear Schr\"odinger equations (NLSE), which have important applications in signal processing for modern optical fiber transmission systems, and experimental results show that our model has better accuracy and lower computational complexity than numerical schemes and the baseline DNNs.
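For reference, the classical operator-splitting scheme that DOSnet takes as its architectural template can be sketched for a simple normalized NLSE, i u_z + (1/2) u_tt + |u|^2 u = 0, using the split-step Fourier method: the nonlinear and dispersive sub-operators are applied alternately. The step sizes and soliton test case below are illustrative; DOSnet's learnable layers are not reproduced here.

```python
# Classical (Lie) split-step Fourier scheme for a normalized NLSE; a sketch of
# the operator-splitting idea only, not the DOSnet architecture itself.
import numpy as np

def split_step_nlse(u0, t, dz, num_steps):
    """Propagate u0(t): alternate the nonlinear sub-step in the time domain
    with the dispersive (linear) sub-step applied in the Fourier domain."""
    omega = 2 * np.pi * np.fft.fftfreq(t.size, d=t[1] - t[0])
    linear_phase = np.exp(-0.5j * omega**2 * dz)       # dispersive factor exp(-i*omega^2*dz/2)
    u = u0.astype(complex)
    for _ in range(num_steps):
        u = u * np.exp(1j * np.abs(u)**2 * dz)         # nonlinear sub-step
        u = np.fft.ifft(linear_phase * np.fft.fft(u))  # dispersive sub-step
    return u

t = np.linspace(-20, 20, 1024, endpoint=False)
u = split_step_nlse(1.0 / np.cosh(t), t, dz=1e-3, num_steps=1000)  # fundamental soliton
print(np.max(np.abs(u)))  # soliton amplitude stays close to 1
```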
Image super-resolution is a common task on mobile and IoT devices, where one often needs to upscale and enhance low-resolution images and video frames. While numerous solutions have been proposed for this problem in the past, they are usually not compatible with low-power mobile NPUs, which have many computational and memory constraints. In this Mobile AI challenge, we address this problem and ask the participants to design an efficient quantized image super-resolution solution that can demonstrate real-time performance on mobile NPUs. The participants were provided with the DIV2K dataset and trained INT8 models to perform high-quality 3X image upscaling. The runtime of all models was evaluated on the Synaptics VS680 Smart Home board with a dedicated edge NPU capable of accelerating quantized neural networks. All proposed solutions are fully compatible with the above NPU, demonstrating up to 60 FPS when reconstructing Full HD resolution images. A detailed description of all models developed in the challenge is provided in this paper.
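As a hedged illustration of the kind of INT8 pipeline the challenge targets, the sketch below performs full-integer post-training quantization of a toy 3X upscaling network with TensorFlow Lite. The tiny placeholder model, calibration data, and file name are assumptions for illustration only; actual challenge solutions were trained on DIV2K and used their own architectures and quantization setups.

```python
# Hedged sketch: full-integer (INT8) post-training quantization of a toy 3X
# super-resolution model via TensorFlow Lite; not any participant's solution.
import numpy as np
import tensorflow as tf

def tiny_sr_model(scale: int = 3) -> tf.keras.Model:
    """Placeholder 3X super-resolution net: two convs + depth_to_space upsampling."""
    inp = tf.keras.Input(shape=(64, 64, 3))
    x = tf.keras.layers.Conv2D(16, 3, padding="same", activation="relu")(inp)
    x = tf.keras.layers.Conv2D(3 * scale * scale, 3, padding="same")(x)
    out = tf.keras.layers.Lambda(lambda t: tf.nn.depth_to_space(t, scale))(x)
    return tf.keras.Model(inp, out)

def representative_data():
    """Calibration inputs; a real pipeline would draw these from DIV2K."""
    for _ in range(8):
        yield [np.random.rand(1, 64, 64, 3).astype(np.float32)]

converter = tf.lite.TFLiteConverter.from_keras_model(tiny_sr_model())
converter.optimizations = [tf.lite.Optimize.DEFAULT]            # enable quantization
converter.representative_dataset = representative_data          # calibration for INT8
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.inference_input_type = tf.uint8                       # integer I/O for the NPU
converter.inference_output_type = tf.uint8
with open("sr_int8.tflite", "wb") as f:
    f.write(converter.convert())
```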
Federated learning (FL) is a collaborative machine learning framework in which different clients (e.g., Internet of Things devices) participate in model training by training local models and uploading them to an FL server in each global iteration. Upon receiving the local models from all the clients, the FL server generates a global model by aggregating the received local models. This traditional FL process may suffer from the straggler problem in heterogeneous client settings, where the FL server has to wait for slow clients to upload their local models in each global iteration, thus increasing the overall training time. One solution is to set a deadline so that only the clients that can upload their local models before the deadline are selected in the FL process; however, this may lead to a slow convergence rate and global model overfitting due to the limited client selection. In this paper, we propose the Latency awarE Semi-synchronous client Selection and mOdel aggregation for federated learNing (LESSON) method, which allows all the clients to participate in the whole FL process but with different frequencies. That is, faster clients are scheduled to upload their models more frequently than slow clients, thus resolving the straggler problem and accelerating convergence while avoiding model overfitting. LESSON is also capable of adjusting the tradeoff between model accuracy and convergence rate by varying the deadline. Extensive simulations have been conducted to compare the performance of LESSON with two baseline methods, FedAvg and FedCS. The simulation results demonstrate that LESSON achieves faster convergence than FedAvg and FedCS, and higher model accuracy than FedCS.
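A minimal sketch of a semi-synchronous selection rule in the spirit of LESSON follows: clients are grouped into latency tiers relative to the deadline, a tier-k client uploads only every k-th global round, and uploads are aggregated FedAvg-style. The tier rule, weighting, and toy numbers are illustrative assumptions, not the paper's exact scheduling and aggregation algorithm.

```python
# Hedged sketch: latency-tiered, semi-synchronous client participation.
import numpy as np

def assign_tiers(latencies, deadline):
    """Tier = number of global rounds a client needs to finish one local update."""
    return [max(1, int(np.ceil(lat / deadline))) for lat in latencies]

def participating_clients(tiers, round_idx):
    """Clients whose tier divides the current round index upload this round."""
    return [cid for cid, tier in enumerate(tiers) if round_idx % tier == 0]

def aggregate(local_models, sample_counts):
    """FedAvg-style aggregation weighted by each client's sample count."""
    weights = np.asarray(sample_counts, dtype=float)
    weights /= weights.sum()
    return sum(w * m for w, m in zip(weights, local_models))

latencies = [5.0, 12.0, 31.0, 60.0]             # seconds per local update (toy)
tiers = assign_tiers(latencies, deadline=15.0)  # -> [1, 1, 3, 4]
print(tiers, participating_clients(tiers, round_idx=4))
print(aggregate([np.ones(3), 3 * np.ones(3)], sample_counts=[10, 30]))  # -> [2.5 2.5 2.5]
```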
Code pre-trained models (CodePTMs) have recently demonstrated significant success in code intelligence. To interpret these models, some probing methods have been applied, but these methods fail to consider the inherent characteristics of code. In this paper, to address this problem, we propose a novel probing method, CAT-probing, to quantitatively interpret how CodePTMs attend to code structure. We first denoise the input code sequences based on the token types pre-defined by the compilers, filtering out tokens whose attention scores are too small. We then define a new metric, the CAT-score, to measure the commonality between the token-level attention scores generated by CodePTMs and the pairwise distances between the corresponding AST nodes. The higher the CAT-score, the stronger the ability of a CodePTM to capture code structure. We conduct extensive experiments integrating CAT-probing with representative CodePTMs across different programming languages. Experimental results show the effectiveness of CAT-probing for CodePTM interpretation. Our code and data are publicly available at https://github.com/nchen909/CodeAttention.
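The sketch below gives a simplified proxy for the comparison the CAT-score makes: it measures how often strongly attended token pairs are also close in the AST. The thresholds and the agreement formula are illustrative assumptions and not the paper's exact CAT-score definition.

```python
# Simplified, illustrative proxy for an attention-vs-AST-structure comparison;
# NOT the paper's exact CAT-score.
import numpy as np

def structure_agreement(attention: np.ndarray, ast_dist: np.ndarray,
                        attn_thresh: float = 0.1, dist_thresh: int = 2) -> float:
    """Fraction of strongly attended token pairs whose AST distance is small.

    attention: (n, n) token-to-token attention scores (e.g., one head).
    ast_dist:  (n, n) pairwise distances between the tokens' AST nodes.
    """
    strong = attention > attn_thresh      # pairs the model focuses on
    if not strong.any():
        return 0.0
    close = ast_dist <= dist_thresh       # pairs that are structural neighbours
    return float((strong & close).sum() / strong.sum())

attn = np.array([[0.6, 0.3, 0.1],
                 [0.2, 0.5, 0.3],
                 [0.05, 0.15, 0.8]])
dist = np.array([[0, 1, 3],
                 [1, 0, 2],
                 [3, 2, 0]])
print(structure_agreement(attn, dist))
```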
The Tor (The Onion Router) network is a widely used open-source anonymous communication tool, and abuse of Tor makes it difficult to monitor the spread of online crimes such as visiting criminal websites. Most existing approaches to de-anonymizing the Tor network rely heavily on manually extracted features, which is time-consuming and yields poor performance. To address these shortcomings, this paper proposes a neural representation method that identifies website fingerprints with a classification algorithm. We build a new website fingerprinting attack model based on a convolutional neural network (CNN) with dilated and causal convolutions, which enlarge the receptive field of the CNN and capture sequential features of the input data. Experiments on three mainstream public datasets show that, compared with state-of-the-art methods, the proposed model is highly efficient and effective for website fingerprinting classification and improves accuracy by 12.21%.
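A hedged PyTorch sketch of the architectural idea follows: stacked dilated, causal 1-D convolutions over a traffic-trace sequence, followed by global pooling and a website classifier. Channel counts, dilation schedule, and trace length are illustrative assumptions, not the paper's exact configuration.

```python
# Hedged sketch: dilated, causal 1-D CNN for website fingerprinting traces.
import torch
import torch.nn as nn

class CausalConv1d(nn.Module):
    """1-D convolution that only looks at past positions (left padding)."""
    def __init__(self, in_ch, out_ch, kernel_size, dilation):
        super().__init__()
        self.pad = (kernel_size - 1) * dilation
        self.conv = nn.Conv1d(in_ch, out_ch, kernel_size, dilation=dilation)

    def forward(self, x):                        # x: (batch, channels, length)
        return self.conv(nn.functional.pad(x, (self.pad, 0)))

class FingerprintCNN(nn.Module):
    def __init__(self, num_sites, channels=32, dilations=(1, 2, 4, 8)):
        super().__init__()
        layers, in_ch = [], 1
        for d in dilations:                      # receptive field grows with dilation
            layers += [CausalConv1d(in_ch, channels, kernel_size=3, dilation=d),
                       nn.ReLU()]
            in_ch = channels
        self.backbone = nn.Sequential(*layers)
        self.head = nn.Linear(channels, num_sites)

    def forward(self, x):                        # x: (batch, 1, trace_length)
        feats = self.backbone(x).mean(dim=-1)    # global average pooling over time
        return self.head(feats)

model = FingerprintCNN(num_sites=95)
logits = model(torch.randn(4, 1, 5000))          # toy batch of 4 traces
print(logits.shape)                              # torch.Size([4, 95])
```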
Deep neural networks (DNNs) are vulnerable to backdoor attacks during training. A model corrupted in this way behaves normally, but produces a predefined target label when certain patterns in the input trigger it. Existing defenses usually rely on the assumption of a universal backdoor setting, in which poisoned samples share the same uniform trigger. However, recent advanced backdoor attacks show that this assumption no longer holds for dynamic backdoors, where the trigger varies from input to input, thereby defeating existing defenses. In this work, we propose a novel technique, BEATRIX (detection via Gram matrices). BEATRIX utilizes Gram matrices to capture not only feature correlations but also appropriate high-order information of the representations. By learning class-conditional statistics from the activation patterns of normal samples, BEATRIX identifies poisoned samples by detecting anomalies in their activation patterns. To further improve performance in identifying target labels, BEATRIX leverages kernel-based testing without making any prior assumptions about the representation distribution. We demonstrate the effectiveness of our method through extensive evaluation and comparison with state-of-the-art defense techniques. Experimental results show that our approach achieves an F1 score of 91.1% in detecting dynamic backdoors, while the state of the art can only reach 36.9%.
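The sketch below illustrates the Gram-matrix intuition in a much simplified form: per-sample Gram matrices of intermediate activations are compared against class-conditional statistics fitted on clean samples, and large deviations flag candidate poisoned inputs. The plain z-score deviation used here, and the omission of higher-order Gram features and the kernel-based target-label test, are simplifying assumptions rather than BEATRIX's actual procedure.

```python
# Hedged sketch: class-conditional Gram-matrix statistics for anomaly scoring.
import numpy as np

def gram_features(feats: np.ndarray) -> np.ndarray:
    """Flattened upper triangle of the Gram matrix of a (positions, dim)
    activation map; captures pairwise feature correlations for one sample."""
    g = feats.T @ feats
    return g[np.triu_indices_from(g)]

def fit_class_stats(clean_feats_per_sample, labels):
    """Per-class mean and std of Gram features from clean samples."""
    stats = {}
    for c in set(labels):
        grams = np.stack([gram_features(f)
                          for f, y in zip(clean_feats_per_sample, labels) if y == c])
        stats[c] = (grams.mean(axis=0), grams.std(axis=0) + 1e-8)
    return stats

def anomaly_score(feats, predicted_label, stats) -> float:
    """Mean absolute z-score of the sample's Gram features under its class."""
    mean, std = stats[predicted_label]
    return float(np.mean(np.abs((gram_features(feats) - mean) / std)))

rng = np.random.default_rng(0)
clean = [rng.normal(size=(8, 16)) for _ in range(100)]          # toy activations
labels = [i % 2 for i in range(100)]
stats = fit_class_stats(clean, labels)
print(anomaly_score(rng.normal(size=(8, 16)) + 3.0, 0, stats))  # shifted input -> high score
```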
Quotation extraction aims to extract quotations from written text. A quotation has three components: the source refers to the holder of the quotation, the cue is the trigger word, and the content is the body. Existing solutions for quotation extraction mainly rely on rule-based methods and sequence labeling models. While rule-based methods often lead to low recall, sequence labeling models cannot handle quotations with complicated structures well. In this paper, we propose the Context and Former-Label Enhanced Net (CofeNet) for quotation extraction. CofeNet is able to extract complicated quotations whose components have variable lengths and complicated structures. On two public datasets (i.e., PolNeAR and Riqua) and one proprietary dataset (i.e., PoliticsZH), we show that our CofeNet achieves state-of-the-art performance on complicated quotation extraction.
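For concreteness, the toy example below shows the three quotation components on a single sentence, both as spans and as the kind of BIO tag sequence a tagging-based extractor would output. The sentence and tag scheme are illustrative; CofeNet's context and former-label enhancement mechanisms are not reproduced.

```python
# Illustrative example of source / cue / content components; not CofeNet itself.
sentence = "The minister said that the new policy would take effect in May."

quotation = {
    "source":  "The minister",                                   # holder of the quote
    "cue":     "said",                                           # trigger word
    "content": "that the new policy would take effect in May",   # quoted body
}

# A sequence-labeling view of the same example (one BIO tag per token),
# the kind of output a tagging-based extractor would produce.
tokens = sentence.rstrip(".").split()
tags = (["B-SRC", "I-SRC"] + ["B-CUE"] +
        ["B-CONT"] + ["I-CONT"] * (len(tokens) - 4))
print(list(zip(tokens, tags)))
```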